Robust multilingual statistical morphological generation models

Authors

  • Ondrej Dusek
  • Filip Jurcícek
Abstract

We present a novel method of statistical morphological generation, i.e., the prediction of inflected word forms given a lemma, part of speech, and morphological features, aimed at robustness to unseen inputs. Our system uses a trainable classifier to predict “edit scripts” that are then used to transform lemmas into inflected word forms. Suffixes of lemmas are included as features to achieve robustness. We evaluate our system on six languages with varying degrees of morphological richness. The results show that the system is able to learn most morphological phenomena and generalize to unseen inputs, producing significantly better results than a dictionary-based baseline.
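The edit-script idea can be illustrated with a minimal sketch. The abstract does not specify the format of the edit scripts, so the code below assumes a simplified, suffix-only script (cut some characters from the end of the lemma, then append a string); the names `EditScript`, `derive_script`, `apply_script`, and `suffix_features` are hypothetical and are not taken from the paper. The trainable classifier that would choose a script from the lemma suffixes, part of speech, and morphological features is left out.

```python
from typing import Dict, NamedTuple


class EditScript(NamedTuple):
    """Simplified suffix-only edit script (an assumption, not the paper's format)."""
    cut: int   # number of characters removed from the end of the lemma
    add: str   # string appended after the cut


def derive_script(lemma: str, form: str) -> EditScript:
    """Derive the script that turns `lemma` into `form` via their common prefix."""
    common = 0
    for a, b in zip(lemma, form):
        if a != b:
            break
        common += 1
    return EditScript(cut=len(lemma) - common, add=form[common:])


def apply_script(lemma: str, script: EditScript) -> str:
    """Apply a script to a lemma, which may differ from the one it was derived from."""
    return lemma[:len(lemma) - script.cut] + script.add


def suffix_features(lemma: str, max_len: int = 3) -> Dict[str, str]:
    """Lemma suffixes of length 1..max_len, usable as classifier features."""
    longest = min(max_len, len(lemma))
    return {f"suffix{n}": lemma[-n:] for n in range(1, longest + 1)}


# A script learned from one lemma/form pair transfers to an unseen lemma
# that shares the relevant suffix.
script = derive_script("try", "tries")
print(script)
print(apply_script("cry", script))
print(suffix_features("cry"))
```

Because the script records only what changes at the end of the word, the same script covers many lemmas, which is why predicting scripts from lemma suffixes can generalize to words missing from a dictionary.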

Similar resources

Fast and Robust Multilingual Dependency Parsing with a Generative Latent Variable Model

We use a generative history-based model to predict the most likely derivation of a dependency parse. Our probabilistic model is based on Incremental Sigmoid Belief Networks, a recently proposed class of latent variable models for structure prediction. Their ability to automatically induce features results in multilingual parsing which is robust enough to achieve accuracy well above the average ...

Adventures in Multilingual Parsing

The typological diversity of the world’s languages poses important challenges for the techniques used in machine translation, syntactic parsing and other areas of natural language processing. Statistical models developed and tuned for English do not necessarily perform well for richly inflected languages, where larger morphological paradigms and more flexible word order gives rise to data spars...

Robust and adaptive architecture for multilingual spoken dialogue systems

We present how robustness and adaptivity can be supported by the spoken dialogue system architecture. AthosMail is a multilingual spoken dialogue system for e-mail domain. It is being developed in the EU-funded DUMAS project. It has flexible system architecture supporting multiple components for input interpretation, dialogue management and output generation. In addition to language differences...

Robust Power Control of Microgrid based on Hybrid Renewable Power Generation Systems

This paper presents modeling and control of a hybrid distributed energy sources including photovoltaic (PV), fuel cell (FC) and battery energy storage (BES) in a microgrid which provides both real and reactive power to support an unbalanced utility grid. The overall configuration of the microgrid including dynamic models for the PV, FC, BES and its power electronic interfacing are briefly descr...

Génération de phrases multilingues par apprentissage automatique de modèles de phrases. (Multilingual Natural Language Generation using sentence models learned from corpora)

Multilingual Natural Language Generation using sentence models learned from corpora Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system. In this thesis report, we present an architecture of NLG system relying on statistical methods. The originality of our proposition is its ability to use a corpus as a lea...

Publication date: 2013